Add support for custom BIO method #1000

Open
noteflakes wants to merge 11 commits into ruby:master from noteflakes:ruby_bio

Conversation

@noteflakes
Contributor

@noteflakes noteflakes commented Jan 23, 2026

This PR adds support for using a custom BIO method for performing SSL/TLS I/O. It is an alternative to #736 and a possible fix for #731.

Summary

Currently, the openssl gem uses a socket BIO (over non-blocking sockets) that bypasses the Ruby I/O layer, except for calling io_wait_readable/io_wait_writable to wait for I/O readiness. This prevents or makes it difficult to do encrypted I/O over alternative transports such as proxy connections (#731) or virtual sockets, for example in a testing situation.

I've also been looking at providing a better way to integrate the openssl gem with a fiber scheduler and specifically being able to use it in conjunction with a low-level API for performing I/O using io_uring that I'm developing.

The aim of this PR is to provide a minimal API for setting a custom BIO method that either uses the stock IO#read and IO#write methods to perform I/O, or uses custom procs for the read and write operations. The latter allows complete freedom to perform I/O through a low-level API, a proxy connection, or any virtual interface.

The proposed solution is based on the following design principles:

  • Keep the existing I/O implementation as the default behavior.
  • Minimize API changes.
  • Don't touch the implementation of the different SSLSocket I/O methods in ossl_ssl.c: #read, #write, #connect, #accept etc.
  • Minimize copying of buffers.
  • Maximize flexibility for custom BIO methods.

API

We add two methods:

  • SSLSocket#bio_method: get the socket's BIO method.

  • SSLSocket#bio_method=: set the socket's BIO method.

The setter method accepts the following:

  • nil: use the default socket BIO

  • IO: use the given IO instance to perform I/O, via its #read and #write methods. This will usually be the underlying socket object. Example usage:

    ssl.bio_method = ssl.to_io

  • Object: use the given object to perform I/O, using the same interface as for an IO. Example usage:

    class MyIO
      def read(len)
        'foo'
      end

      def write(str)
        str.bytesize
      end
    end

    ssl.bio_method = MyIO.new

  • [read_proc, write_proc]: use the given pair of read/write procs to perform I/O.

The read proc takes an IO::Buffer and a maximum read length, and should return the number of bytes read. The write proc takes an IO::Buffer and a write length, and should return the number of bytes written. Example usage:

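 io = ssl.to_io
 ssl.bio_method = [
   # Read proc: copy up to maxlen bytes into buf, return the byte count.
   # Note: io.read returns nil at EOF, which real code should handle.
   ->(buf, maxlen) {
     str = io.read(maxlen)
     len = str.bytesize
     buf.set_string(str)
     len
   },
   # Write proc: write the first len bytes of buf, return the byte count.
   ->(buf, len) {
     str = buf.get_string(0, len)
     io.write(str)
   }
 ]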
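
To make the proc contract concrete without depending on this PR, here is a minimal sketch that exercises a read/write proc pair against in-memory StringIO objects. The names `incoming` and `outgoing` and the harness calls at the bottom are assumptions for illustration only: in the PR the procs would be invoked by the custom BIO, not by Ruby code. Reporting EOF as 0 bytes is likewise an assumption, not something the PR description specifies.

```ruby
require "stringio"

# Hypothetical in-memory transport standing in for a real socket.
incoming = StringIO.new("hello from peer")
outgoing = StringIO.new

# Read proc: fill the buffer with up to maxlen bytes, return the byte count.
read_proc = ->(buf, maxlen) {
  str = incoming.read(maxlen)
  next 0 if str.nil? # assumption: report EOF as 0 bytes read
  buf.set_string(str)
  str.bytesize
}

# Write proc: send the first len bytes of the buffer, return the byte count.
write_proc = ->(buf, len) {
  outgoing.write(buf.get_string(0, len))
}

# Stand-in for the calls a custom BIO would make.
buf = IO::Buffer.new(64)
nread = read_proc.call(buf, 5)
received = buf.get_string(0, nread)

buf.set_string("ping")
nwritten = write_proc.call(buf, 4)
```

The same pair of procs could wrap a proxy connection or a test double, since the only thing they share with the caller is the IO::Buffer and the byte counts.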

Implementation

Since we're just changing the BIO associated with the SSLSocket instance, we don't need to touch the I/O implementation in functions such as ossl_ssl_read_internal. The custom BIO interface will never return SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, and thus the I/O operation will complete immediately after the call to e.g. SSL_read.

Since the custom BIO read and write hooks receive raw char * buffers, we need to pass the buffer as either a String (in the case of the IO/Object method) or an IO::Buffer (in the case of read/write procs). The advantage of using an IO::Buffer is that there's no need to copy data between the raw buffer and the string (or vice versa). Hopefully, in the future, IO#read and IO#write will be able to accept an IO::Buffer as well as a String as the buffer.

Performance

A preliminary benchmark (source) shows a significant advantage to using a custom BIO method. This benchmark measures the performance of the default socket BIO, the IO custom BIO method, and a custom BIO method using the UringMachine low-level API.

ruby 4.0.1 (2026-01-13 revision e04267a14b) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
             default     2.108k i/100ms
             BIO: IO     2.925k i/100ms
             BIO: UM     2.801k i/100ms
Calculating -------------------------------------
             default     26.535k (± 2.8%) i/s   (37.69 μs/i) -    132.804k in   5.009171s
             BIO: IO     31.038k (± 5.2%) i/s   (32.22 μs/i) -    155.025k in   5.009282s
             BIO: UM     27.835k (± 5.7%) i/s   (35.93 μs/i) -    140.050k in   5.050643s

Comparison:
             default:    26534.7 i/s
             BIO: IO:    31038.5 i/s - 1.17x  faster
             BIO: UM:    27834.9 i/s - same-ish: difference falls within error

Of course, the performance implications need to be investigated more thoroughly and may vary by OS, OpenSSL version, machine, network setup, and level of concurrency.

OpenSSL and Ruby Compatibility

The implementation depends on the availability of BIO_meth_new and associated functions, which were added in OpenSSL 1.1.0.

The implementation also depends on the availability of the IO::Buffer C API, namely rb_io_buffer_new (available since Ruby 3.1) and rb_io_buffer_free_locked (available since Ruby 3.3).

Future work

  • More tests, more benchmarks.
  • Add bio_method kwarg to SSLSocket.new/SSLSocket.open (see also Add sync_close kwarg to SSLSocket.new #996).
  • Eventually, if no problems are encountered, set the default BIO method to use the underlying socket and its #read and #write methods.

cc @HoneyryderChuck @ioquatix @rhenium

This adds support for using a custom BIO method for performing SSL/TLS
I/O through a Ruby IO instance (normally the underlying socket).
Alternatively, a pair of read and write procs may be used for performing
the I/O.
@rhenium
Member

rhenium commented Jan 23, 2026

I have a WIP branch in #736 which takes a similar approach of providing a custom BIO_METHOD implementation, but it accepts a Ruby object that provides IO-like non-blocking read/write methods: #read_nonblock(integer, exception: false) and #write_nonblock(string, exception: false). Since I was expecting the main use case to be nested TLS, this felt like the most natural choice as the underlying IO interface.

> Since we're just changing the BIO associated with the SSLSocket instance, we don't need to touch the I/O implementation in functions such as ossl_ssl_read_internal. The custom BIO interface will never return SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, and thus the I/O operation will complete immediately after the call to e.g. SSL_read.

Wouldn't that make all methods on SSLSocket blocking, including #{read,write,accept,connect}_nonblock? I feel support for these non-blocking methods is important, as many existing users currently rely on them to implement high-level timeouts.
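
For readers unfamiliar with the pattern being referred to, a timeout built on the non-blocking API typically looks something like the sketch below. It uses a plain pipe in place of an SSLSocket so it runs on its own, and `read_with_timeout` is a hypothetical helper name. An SSLSocket can also return :wait_writable from a read, which this simplified version does not handle, and the timeout here applies to each wait rather than to the whole call.

```ruby
require "io/wait"
require "timeout"

# Hypothetical helper: read up to maxlen bytes, giving up if the IO does not
# become readable within `timeout` seconds.
def read_with_timeout(io, maxlen, timeout)
  loop do
    result = io.read_nonblock(maxlen, exception: false)
    return result unless result == :wait_readable

    # wait_readable returns nil when the timeout expires.
    raise Timeout::Error, "read timed out" unless io.wait_readable(timeout)
  end
end

r, w = IO.pipe

# Nothing has been written yet, so this read times out.
timed_out = begin
  read_with_timeout(r, 16, 0.05)
  false
rescue Timeout::Error
  true
end

# Once data is available, the same call returns it.
w.syswrite("abc")
data = read_with_timeout(r, 16, 0.05)
```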

> • Don't touch the implementation of the different SSLSocket I/O methods in ossl_ssl.c: #read, #write, #connect, #accept etc.

Some changes will be necessary there for proper error handling. Because Ruby exceptions are implemented using longjmp in C extensions, failing to handle them properly can corrupt OpenSSL's internal state or cause memory leaks.

If the underlying BIO raises an exception, we must temporarily catch it, allow OpenSSL to clean up its internal state, wait for SSL_read() to return, and then re-raise the exception from SSLSocket#sysread. Unfortunately, the ~10 different callbacks settable on SSLContext complicate this because they can be called at different times. #736 is currently blocked because it likely doesn't handle all edge cases correctly yet (and I haven't had time to review them carefully, though I've been hoping to finish it at some point).
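
The catch, return, then re-raise sequence described above can be sketched in plain Ruby as follows. This is only an analogy: in the C extension the equivalent would be built on rb_protect and rb_jump_tag, and the names `DeferredError` and `native_call` are hypothetical stand-ins, with `native_call` playing the role of a function like SSL_read() that must return normally before any exception propagates.

```ruby
# Hypothetical helper that captures an exception raised inside a callback
# and lets the caller re-raise it later, once the "native" call has returned.
class DeferredError
  def initialize
    @pending = nil
  end

  # Wrap a callable so that it returns -1 instead of raising.
  def wrap(callable)
    lambda do |*args|
      begin
        callable.call(*args)
      rescue StandardError => e
        @pending ||= e # keep the first error only
        -1
      end
    end
  end

  # Re-raise the stored exception, if any.
  def reraise!
    err, @pending = @pending, nil
    raise err if err
  end
end

cleanup_ran = false

# Stand-in for a native function that invokes a callback and then does
# cleanup which must not be skipped by a non-local jump.
native_call = lambda do |callback|
  rc = callback.call
  cleanup_ran = true
  rc
end

guard = DeferredError.new
failing_read = -> { raise IOError, "transport closed" }

rc = native_call.call(guard.wrap(failing_read))

raised = begin
  guard.reraise!
  nil
rescue IOError => e
  e
end
```

The complication mentioned above is that every callback reachable from the native call needs this treatment, not just the BIO read and write hooks.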

@noteflakes
Contributor Author

noteflakes commented Jan 26, 2026

> Wouldn't that make all methods on SSLSocket blocking, including #{read,write,accept,connect}_nonblock? I feel support for these non-blocking methods is important, as many existing users currently rely on them to implement high-level timeouts.

Indeed, this implementation would change the behavior to blocking, at least when using an IO-like object for the underlying BIO. However, for the custom BIO option (providing a read and a write proc), I have just added support for returning :wait_readable and :wait_writable, which makes it possible to use the non-blocking API. Test case: https://github.com/ruby/openssl/pull/1000/changes#diff-9b8fe96fbb75f887aaa0cb69b982c886e041707496f37d8d6a5b443d92a5c347R2504
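
As a rough illustration of what such a read proc might look like, the sketch below wraps read_nonblock on a pipe and passes :wait_readable through to the caller. It runs without this PR: the direct calls to `read_proc` at the bottom stand in for the BIO, and mapping EOF to 0 is an assumption rather than documented behavior of the PR.

```ruby
r, w = IO.pipe

# Read proc that never blocks: returns :wait_readable when no data is
# available, otherwise copies the data into buf and returns the byte count.
read_proc = ->(buf, maxlen) {
  result = r.read_nonblock(maxlen, exception: false)
  case result
  when :wait_readable then :wait_readable
  when nil then 0 # assumption: report EOF as 0 bytes read
  else
    buf.set_string(result)
    result.bytesize
  end
}

buf = IO::Buffer.new(16)

# The pipe is empty, so the proc asks the caller to wait.
first = read_proc.call(buf, 16)

# After data arrives, the proc fills the buffer.
w.syswrite("abc")
second = read_proc.call(buf, 16)
payload = buf.get_string(0, second)
```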

I understand the importance of supporting the non-blocking API. At the same time, having everything implemented as non-blocking is preventing other use cases, such as better integration with a fiber scheduler, or using io_uring for the actual I/O. Allowing a blocking BIO method is also an opportunity for achieving better performance, especially in conjunction with io_uring.

> If the underlying BIO raises an exception, we must temporarily catch it, allow OpenSSL to clean up its internal state, wait for SSL_read() to return, and then re-raise the exception from SSLSocket#sysread.

I have included a test case for raising an exception from within the BIO method, which seems to work correctly. However, this may need more testing. https://github.com/ruby/openssl/pull/1000/changes#diff-9b8fe96fbb75f887aaa0cb69b982c886e041707496f37d8d6a5b443d92a5c347R2460

> Unfortunately, the ~10 different callbacks settable on SSLContext complicate this because they can be called at different times. #736 is currently blocked because it likely doesn't handle all edge cases correctly yet (and I haven't had time to review them carefully, though I've been hoping to finish it at some point).

Looking at the code in ossl_ssl.c, it might be interesting to remove the calls to rb_protect in the different callback implementations, and instead wrap functions such as ossl_start_ssl with a single rb_protect. After looking more closely, this is more complicated than I first thought.

@noteflakes
Contributor Author

I'm closing this PR as it doesn't provide a good solution for supporting non-blocking I/O.

@noteflakes noteflakes closed this Feb 4, 2026
@ioquatix
Member

ioquatix commented Feb 4, 2026

I don't think we should throw this out just because it doesn't support non-blocking.

If the only use case for non-blocking IO is users implementing timeouts, that's already better handled by IO#timeout.
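
For reference, IO#timeout (Ruby 3.2 and later) works roughly as in the sketch below, shown here on a plain UNIX socket pair. Whether and how SSLSocket honors it is not demonstrated by this example.

```ruby
require "socket"

reader, writer = UNIXSocket.pair
reader.timeout = 0.05 # seconds; applies to blocking operations on this IO

# No data has been written, so the blocking read gives up with a timeout.
timed_out = begin
  reader.read(1)
  false
rescue IO::TimeoutError
  true
end

# With data available, the same blocking read returns normally.
writer.write("x")
byte = reader.read(1)
```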

As an aside, I personally think read_nonblock and write_nonblock were a big mistake for a user-facing interface. So I don't think we should encourage it or use it as a standard internal interface. For example, using exceptions for waiting was an extremely poor design choice.

@noteflakes
Contributor Author

I agree the non-blocking API certainly complicates things but now it's just something we need to deal with. Maybe a solution would be to disable the non-blocking methods when a custom BIO is used? I'm open to suggestions, it's not clear to me how to proceed with this PR.

@HoneyryderChuck since you're behind the original issue #731 would you like to give your take on this PR? Can this solve your specific use case?

@noteflakes noteflakes reopened this Feb 4, 2026
@HoneyryderChuck
Contributor

@noteflakes sorry for the late reply, haven't had much time to come back to the pending openssl related threads.

I like that your proposal is quite simple. Still, it seems that, in order to get non-blocking behaviour, I'd have to run on top of a fiber scheduler that hooks into the blocking read/write APIs, and that's unfortunately not the lowest common denominator in Ruby. My original request had nothing to do with that though; I just wanted to enable HTTPS proxy mode in httpx, the HTTP client library I maintain. And while it works much better under fiber schedulers (since some changes done last year), it's still mostly used without them, and its multiple-request API relies on its own internal event loop, which relies on the non-blocking read/write APIs. That means @rhenium's work fits the most common use case much better for me.

I still think that your angle of allowing a "bring your own BIO method" could be something worthwhile, particularly because in your case, you're targeting a liburing-based fiber scheduler, so ideally openssl should accommodate that optionally for the cases of openssl sockets used under a thread running your scheduler; shipping your own BIO method on top of your fiber scheduler seems like a way to address it.

> As an aside, I personally think read_nonblock and write_nonblock were a big mistake for a user-facing interface.

I can understand the argument, but I think it's too late. Most (all?) network libs in Ruby use the non-blocking variants. I haven't made a lot of use of IO.timeout= yet, so I'm not sure how much of an alternative it is, but at first glance, it'd break with higher-level non-fiber-scheduler-based event loops. From a pragmatic point of view, supporting the non-blocking variants here seems like a fraction of the work it'd take to migrate the whole ecosystem to blocking APIs again while building fiber-scheduler variants on top of completion-based I/O (not all OS/versions support them?) and convincing the community to use them primarily.

FWIW I'd like an openssl library built with more Ruby, with a more direct layer of integration with libopenssl and friends. Most of the public interface maps to C extensions, and there are a lot of "callbacks" to the CRuby API in between calls to the SSL API, which leaves some optimizations on the table and makes reusing code in JRuby impossible, but also makes things such as implementing your own BIO method quite cumbersome. That's why I've been trying to reimplement (most of) the ASN1 module in plain Ruby (something I have to get back to), and it's certainly something that I believe makes @rhenium's patch much harder to validate (there's a lot of CRuby API in that BIO method, which makes it hard to consider the edge cases around error handling). Just my thoughts.

@noteflakes
Contributor Author

> I still think that your angle of allowing a "bring your own BIO method" could be something worthwhile, particularly because in your case, you're targeting a liburing-based fiber scheduler, so ideally openssl should accommodate that optionally for the cases of openssl sockets used under a thread running your scheduler; shipping your own BIO method on top of your fiber scheduler seems like a way to address it.

Fortunately this is pretty easy, and that's actually what I spent the last couple of days doing:

digital-fabric/uringmachine#3

> From a pragmatic point of view, supporting the non-blocking variants here seems like a fraction of the work it'd take to migrate the whole ecosystem to blocking APIs again while building fiber-scheduler variants on top of completion-based I/O (not all OS/versions support them?) and convincing the community to use them primarily.

Maybe this got lost between the comments, but this PR already includes support for non-blocking I/O when using a custom BIO. 6c1db60

> certainly something that I believe makes @rhenium's patch much harder to validate (there's a lot of CRuby API in that BIO method, which makes it hard to consider the edge cases around error handling). Just my thoughts.

Regarding error handling, please correct me if I'm wrong, but it seems to me that this might actually be a non-issue, at least in most circumstances. If any callback raises an exception without rescuing it, or if any I/O method fails with a SystemCallError, or an SSLError, from the point of view of the application the way to deal with this would be to simply close the connection and optionally retry. So in that sense can we say that the SSL connection state post-exception is irrelevant? If there is a need to be able to recover an SSL connection after an exception, can you give an example?

@ioquatix
Member

ioquatix commented Feb 4, 2026

> I just wanted to enable HTTPS proxy mode in httpx

You can already do this using a pipe, just in case you wanted something that's working today.

> IO.timeout= yet, so I'm not sure how much of an alternative it is, but at first glance, it'd break with higher-level non-fiber-scheduler-based event loops

It works fine with or without a fiber scheduler.
